Ridgelet-based signature for natural image classification
This paper presents an approach to grouping natural scenes into semantically meaningful categories. The proposed approach exploits the statistics of natural scenes to define relevant image categories. A ridgelet-based signature is used to represent images. This signature is fed to a support vector classifier, which is well suited to high-dimensional features, resulting in an effective recognition system. To illustrate the potential of the approach, several binary classification experiments (e.g. city/landscape or indoor/outdoor) are conducted on databases of natural scenes.
Pre-classification for automatic image orientation
In this paper, we propose a novel method for automatic orientation detection of digital images. The approach exploits the properties of local statistics of natural scenes, thereby addressing some of the difficulties encountered in previous work in this area. The main contribution of this paper is a pre-classification step into carefully defined categories that simplifies subsequent orientation detection. The proposed algorithm was tested on 9068 images and compared to the existing state of the art in the area. Results show a significant improvement over previous work.
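The two-stage idea (pre-classify into a coarse category, then run a category-specific orientation detector) can be sketched as follows. The category names, routing rule, and detector heuristics below are hypothetical placeholders, not the paper's actual classifiers:

```python
# Sketch of a pre-classification pipeline for orientation detection.
# Categories, thresholds, and detector rules are illustrative placeholders.

def preclassify(image_stats):
    """Route an image to a coarse category from simple statistics."""
    # Hypothetical rule: high edge density suggests a man-made scene.
    return "city" if image_stats["edge_density"] > 0.5 else "landscape"

def detect_orientation_city(image_stats):
    # City scenes: strong vertical structure usually sits in the upper half.
    return 0 if image_stats["vertical_energy_top"] >= image_stats["vertical_energy_bottom"] else 180

def detect_orientation_landscape(image_stats):
    # Landscape scenes: the brighter half (sky) is usually on top.
    return 0 if image_stats["brightness_top"] >= image_stats["brightness_bottom"] else 180

DETECTORS = {"city": detect_orientation_city,
             "landscape": detect_orientation_landscape}

def estimate_orientation(image_stats):
    """Pre-classify first, then apply the category-specific detector."""
    category = preclassify(image_stats)
    return category, DETECTORS[category](image_stats)

stats = {"edge_density": 0.7, "vertical_energy_top": 3.1,
         "vertical_energy_bottom": 1.2, "brightness_top": 0.4,
         "brightness_bottom": 0.6}
print(estimate_orientation(stats))  # → ('city', 0)
```

The point of the pre-classification step is that each detector only has to model one kind of scene statistics, which is the simplification the abstract describes.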
Image metadata estimation using independent component analysis and regression
In this paper, we describe an approach to camera metadata estimation using regression based on Independent Component Analysis (ICA). Semantic scene classification of images using camera metadata related to capture conditions has had some success in the past. However, different makes and models of camera capture different types of metadata, which severely hampers the application of this kind of approach in real systems consisting of photos captured by many different users. We propose to address this issue by using regression to predict the missing metadata from observed data, thereby providing more complete (and hence more useful) metadata for the entire image corpus. The regression itself is ICA-based.
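The imputation idea (predict a missing metadata field from fields that were observed) can be sketched with ordinary least squares standing in for the paper's ICA-based regressor; the field names and training values below are made up:

```python
# Toy sketch: impute a missing metadata field (say, exposure value) from an
# observed one (say, aperture) via linear regression. The paper uses an
# ICA-based regressor; plain least squares stands in here for illustration.

def fit_linear(xs, ys):
    """Closed-form least squares for y ≈ a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Training pairs from photos where both fields were recorded (hypothetical).
aperture = [2.0, 2.8, 4.0, 5.6]
exposure = [1.0, 1.4, 2.0, 2.8]

a, b = fit_linear(aperture, exposure)

def impute_exposure(ap):
    """Fill in the missing field for a photo that only recorded aperture."""
    return a * ap + b

print(round(impute_exposure(8.0), 2))  # → 4.0
```

Once the missing fields are filled in, every photo in the corpus carries the same metadata schema, which is what makes the downstream scene classifier applicable across camera models.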
Learning midlevel image features for natural scene and texture classification
This paper deals with coding of natural scenes in order to extract semantic information. We present a new scheme to project natural scenes onto a basis in which each dimension encodes statistically independent information. Basis extraction is performed by independent component analysis (ICA) applied to image patches culled from natural scenes. The study of the resulting coding units (coding filters) extracted from well-chosen categories of images shows that they adapt and respond selectively to discriminant features in natural scenes. Given this basis, we define global and local image signatures relying on the maximal activity of filters on the input image. Locally, the construction of the signature takes into account the spatial distribution of the maximal responses within the image. We propose a criterion to reduce the size of the representation space for faster computation. The proposed approach is tested in the context of texture classification (111 classes) as well as natural scene classification (11 categories, 2037 images). Using a common protocol, commonly used descriptors achieve at most 47.7% average accuracy, while our method reaches up to 63.8%. We show that this advantage does not depend on the size of the signature, and demonstrate the efficiency of the proposed criterion for selecting ICA filters and reducing the dimension.
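The global signature construction (maximal filter activity over image patches) can be sketched as below; the 2x2 filter bank and the patches are toy stand-ins for the ICA-learned basis and real image patches:

```python
# Sketch of a global "max activity" signature: project each patch onto a
# filter bank and keep, per filter, the maximal absolute response over all
# patches. The hand-made 2x2 filters stand in for ICA-learned coding units.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy filter bank (flattened 2x2 filters): horizontal and vertical edges.
filters = [
    [1, 1, -1, -1],   # responds to horizontal edges
    [1, -1, 1, -1],   # responds to vertical edges
]

# Flattened 2x2 patches culled from an image (integer toy values).
patches = [
    [9, 8, 1, 2],     # strong horizontal structure
    [5, 5, 5, 5],     # flat patch, no response
]

def global_signature(patches, filters):
    """One value per filter: its maximal |response| over all patches."""
    return [max(abs(dot(f, p)) for p in patches) for f in filters]

print(global_signature(patches, filters))  # → [14, 0]
```

The local variant described in the abstract would additionally record where in the image each maximal response occurred, rather than only its magnitude.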
Learning Finer-class Networks for Universal Representations
Many real-world visual recognition use-cases cannot directly benefit from state-of-the-art CNN-based approaches because of the scarcity of annotated data. The usual way to deal with this is to transfer a representation pre-learned on a large annotated source-task onto a target-task of interest. This raises the question of how "universal" the original representation is, that is to say, how directly it adapts to many different target-tasks. To improve such universality, the state of the art consists in training networks on a diversified source problem, modified by adding either generic or specific categories to the initial set of categories. In this vein, we propose a method that exploits finer classes than the most specific existing ones, for which no annotation is available. We rely on unsupervised learning and a bottom-up split-and-merge strategy. We show that our method learns more universal representations than the state of the art, leading to significantly better results on 10 target-tasks from multiple domains, using several network architectures, either alone or combined with networks learned at a coarser semantic level.
Comment: British Machine Vision Conference (BMVC) 201
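The bottom-up split-and-merge strategy can be illustrated on 1-D features: split each coarse class into candidate finer clusters, then merge clusters whose centroids are too close. This is purely illustrative of the strategy, not the paper's algorithm:

```python
# Toy split-and-merge over one coarse class of 1-D features: split below/above
# the mean (a crude 2-means), then greedily merge clusters whose centroids lie
# within a threshold. Illustrative only, not the paper's actual procedure.

def split_two(values):
    """Crude 1-D split: examples below vs. above the mean."""
    m = sum(values) / len(values)
    low = [v for v in values if v <= m]
    high = [v for v in values if v > m]
    return [c for c in (low, high) if c]

def merge_close(clusters, threshold):
    """Greedily merge clusters whose centroids are within `threshold`."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = sum(clusters[i]) / len(clusters[i])
                cj = sum(clusters[j]) / len(clusters[j])
                if abs(ci - cj) < threshold:
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

# One "coarse class" whose features actually hide two finer groups.
features = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
finer = split_two(features)
print(len(finer))                          # → 2 candidate finer classes
print(len(merge_close(finer, 0.2)))        # → 2 (centroids far apart: keep)
```

The merge step is what prevents the split from inventing finer classes that are not actually distinct; with a large enough threshold the two groups would collapse back into one.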
Detecting the presence of large buildings in natural images
This paper addresses the issue of classifying low-level features into high-level semantic concepts for the purpose of semantic annotation of consumer photographs. We adopt a multi-scale approach that relies on edge detection to extract an edge-orientation-based feature description of the image, and apply an SVM learning technique to infer the presence of a dominant building object in a general-purpose collection of digital photographs. The approach exploits prior knowledge of the image context through the assumption that all input images are "outdoor", i.e. indoor/outdoor classification (the context determination stage) has already been performed. The proposed approach is validated on a diverse dataset of 1720 images, and its performance is compared with that of the MPEG-7 edge histogram descriptor.
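The feature side can be sketched as an edge-orientation histogram built from image gradients; buildings tend to concentrate mass in the vertical and horizontal bins. The binning below is a generic single-scale sketch, not the paper's exact multi-scale descriptor:

```python
# Sketch: histogram of gradient orientations over edge pixels. A dominant
# building typically yields peaks in the near-horizontal and near-vertical
# bins. Generic binning, not the paper's multi-scale descriptor.
import math

def orientation_histogram(gradients, n_bins=4):
    """Histogram of gradient orientations folded into [0, 180) degrees.

    gradients: list of (gx, gy) pairs, one per edge pixel.
    """
    hist = [0] * n_bins
    for gx, gy in gradients:
        angle = math.degrees(math.atan2(gy, gx)) % 180.0
        hist[int(angle // (180.0 / n_bins)) % n_bins] += 1
    return hist

# Hypothetical edge pixels: mostly vertical edges (horizontal gradients).
grads = [(1.0, 0.0), (2.0, 0.1), (0.0, 1.0), (1.5, 0.05)]
print(orientation_histogram(grads))  # → [3, 0, 1, 0]
```

A histogram like this, computed at several scales and concatenated, would then be the input vector to the SVM classifier the abstract describes.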
AVAE: Adversarial Variational Auto Encoder
Among the wide variety of image generative models, two stand out: Variational Auto Encoders (VAE) and Generative Adversarial Networks (GAN). GANs can produce realistic images, but they suffer from mode collapse and do not provide a simple way to obtain the latent representation of an image. VAEs, on the other hand, do not have these problems, but they often generate less realistic images than GANs. In this article, we explain that this lack of realism is partially due to a common underestimation of the dimensionality of the natural image manifold. To solve this issue, we introduce a new framework that combines VAE and GAN in a novel and complementary way to produce an auto-encoding model that keeps the properties of a VAE while generating images of GAN quality. We evaluate our approach both qualitatively and quantitatively on five image datasets.
Comment: pre-print version of an article to appear in the proceedings of the International Conference on Pattern Recognition (ICPR 2020) in January 202
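The general shape of a combined VAE/GAN objective, a reconstruction term plus a KL term (the VAE side) plus an adversarial realism term (the GAN side), can be sketched as plain arithmetic. The weighting and the specific adversarial term below are illustrative conventions, not the paper's exact objective:

```python
# Sketch of a combined VAE/GAN training objective for the encoder/decoder:
# reconstruction + KL (VAE side) + weighted adversarial term (GAN side).
# The weighting scheme is illustrative, not the paper's exact loss.
import math

def kl_diag_gaussian(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def combined_loss(recon_err, mu, log_var, disc_score, adv_weight=0.1):
    """Total generator-side loss.

    recon_err:  pixel-space reconstruction error (>= 0)
    disc_score: discriminator's probability that the decoded image is real
    adv_weight: hypothetical trade-off between realism and fidelity
    """
    adv_term = -math.log(max(disc_score, 1e-12))  # non-saturating GAN term
    return recon_err + kl_diag_gaussian(mu, log_var) + adv_weight * adv_term

# With a perfect posterior (mu=0, log_var=0) and a fully fooled
# discriminator (score=1), only the reconstruction error remains.
print(combined_loss(0.25, [0.0, 0.0], [0.0, 0.0], 1.0))  # → 0.25
```

The design tension the abstract points at lives in `adv_weight`: pushing it up favors GAN-like realism, pushing it down favors VAE-like faithful reconstruction and a well-behaved latent space.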
- …